Statistical computation and visualization (MATH-517)
Wildfires are uncontrolled fires that burn in wildland vegetation, often in rural areas. They are not limited to a particular continent or environment, and have burned many kinds of ecosystems on Earth for hundreds of millions of years (André Gabrielli 2019). Wildfires are a pressing issue all over the world, alongside climate change and the preservation of nature and ecosystems. There have been several major wildfires recently, growing in number and severity: for instance, the 2020 California wildfires became one of the largest wildfire seasons in California's history (Holly Yan, Cheri Mossburg, Artemis Moshtaghian and Paul Vercammen 2020), with several million acres burnt (Topher Gauk-Roger, Stella Chan, Jason Hanna and Steve Almasy 2020). Turkey also went through its worst wildfire season in July and August 2021 (Mert Ozkan and Ezgi Erkoyun 2021), and the 2019-2020 bushfires in Australia (also known as the Black Summer) killed several billion animals. Many of them belonged to endangered species, some of which were believed to have been driven to extinction by this event (Michael Slezak 2020). Many countries therefore aim to minimize the size and the number of wildfires, since they can cause many direct and indirect human fatalities (Steven Reinberg 2021), as well as air pollution (Sarah Gibbens 2021) and the loss of ecosystems and biodiversity (as in the case of the Black Summer).
In order to reduce the number and the severity of wildfires, it is necessary to understand the main factors behind them. Wildfires are very often caused accidentally (burning debris, agricultural activities, campfires, smoking) or intentionally (arson, children). Although the latter case can be prevented, human or non-human accidents can always happen and are hard to predict (Wildfire Causes, n.d.). However, we can suspect that certain natural conditions make those accidents more likely to happen and to grow in size. By identifying them, states can build a strategy to efficiently suppress a wildfire once it starts, and prepare for locations that are highly likely to catch fire at a certain period of the year.
In addition, those factors change over time, creating better or worse conditions for wildfires. For instance, global warming is strongly suspected to be one of the main reasons why wildfires have become more frequent in recent years (Alejandra Borunda 2021). Countries have been undergoing climate change, and such unexpected events may be at the root of the recent disasters.
The purpose of this investigation is to give an answer to the following question:
What are the main factors that affect the propagation of wildfires within the United States?
The investigation consists in answering the following subquestions:
How has the number of fires in the United States of America evolved over time from 1993 to 2015?
How do the land covers vary over time?
How are fires distributed across the land covers and meteorological factors?
To answer the above questions, we proceed as follows. We start with a descriptive and visual approach, producing interactive plots that present the data along different dimensions: geographically, temporally and by different factors. Afterwards, the distribution of land covers is studied. The variation of land covers at a given location is also analysed, because changes have been noticed in some areas. We then analyse the number of fires with respect to the land covers and the meteorological parameters. Since the correlation between parameters turned out to be low, we used subsets of the data and performed a quantile regression.
To perform this analysis, a dataset for the United States from 1993 to 2015 will be used. It contains 563,983 rows with 37 columns. The columns are the following:
Please note that the area proportions \(lc1\) to \(lc18\) do not always sum to exactly 1 for each pixel and month since a few classes with quasi-0 proportion have been removed.
Since the original data was provided in the context of a prediction competition with the University of Edinburgh, there are 8,000 missing values in each of the \(CNT\) and \(BA\) columns. The missing values are not necessarily located in the same rows for the two features.
When considering only rows without missing values, 452,930 rows remain.
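The filtering step above can be sketched as follows. This is a minimal illustration on made-up rows, not the report's actual code: the dictionary layout and the use of `None` for a masked value are assumptions.

```python
# Hypothetical rows: each dict is one (location, month) observation.
# None marks a value that is missing in the dataset (assumed encoding).
rows = [
    {"CNT": 0, "BA": 0.0},
    {"CNT": None, "BA": 1.6},
    {"CNT": 2, "BA": None},
    {"CNT": 5, "BA": 12.3},
]


def complete_cases(rows, columns=("CNT", "BA")):
    """Keep only the rows where none of the given columns is missing."""
    return [row for row in rows if all(row[col] is not None for col in columns)]


kept = complete_cases(rows)
print(len(rows), "rows in total,", len(kept), "rows without missing values")
```

Because the missing values of \(CNT\) and \(BA\) do not fall on the same rows, a row is dropped as soon as either of the two is missing.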
| Statistic | Min | Pctl(25) | Median | Pctl(75) | Mean | Max |
|---|---|---|---|---|---|---|
| CNT | 0 | 0 | 0 | 2 | 2.280 | 359 |
| BA | 0 | 0 | 0 | 1.6 | 158.898 | 538,054 |
| sum_of_lcs | 0.822 | 0.997 | 0.999 | 1.000 | 0.997 | 1.000 |
As shown in Table 1, wildfires remain relatively rare events. Looking at the feature \(CNT\), at least 75% of the observations (one location in one month) record at most two fires. The same applies to the feature \(BA\), the aggregated burnt area, whose distribution is strongly positively skewed: its median is 0 while its mean is 158.898.
As stated in the data description, the proportions of the 18 land covers do not always add up to one. Figure 1 and Table 1 show that the minimum value of the sum is 0.82. The first quartile shows that only 25% of the rows have a sum below approximately 0.997. We therefore consider the sums close enough to 1 and continue with the data as is.
Figure 1: Histogram representing the sum of land covers by row from 1993 to 2015. The histogram is negatively skewed.
Figure 2: Distribution of land covers from 1993 to 2015
Figure 2 displays the distribution of land covers using boxplots. Most land covers represent less than 10% of each location considered, which shows that the areas considered are diverse.
Some transformations were applied to the features. First, the temperature was converted from Kelvin to Celsius. Next, the U-component of wind (the wind speed in the eastward direction) and the V-component of wind (the wind speed in the northward direction) were aggregated using the Euclidean norm of the vector: \[W\hspace{-2pt}speed=\sqrt{{W\hspace{-2pt}speed_{East}}^2 + {W\hspace{-2pt}speed_{North}}^2 }\] with \(W\hspace{-2pt}speed\) the wind speed.
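The two transformations can be illustrated with a short sketch. The function and argument names are hypothetical (the dataset's own column names are not assumed here), and the sample values are made up.

```python
import math

KELVIN_OFFSET = 273.15  # 0 degrees Celsius expressed in Kelvin


def kelvin_to_celsius(temp_kelvin):
    """Convert a temperature from Kelvin to Celsius."""
    return temp_kelvin - KELVIN_OFFSET


def wind_speed(u_east, v_north):
    """Euclidean norm of the (eastward, northward) wind vector."""
    return math.hypot(u_east, v_north)


# Made-up sample values for one observation
temperature_c = kelvin_to_celsius(293.15)
speed = wind_speed(3.0, -4.0)
print(f"{temperature_c:.2f} degrees Celsius, wind speed {speed:.1f}")
```

Taking the norm discards the wind direction, so two observations with opposite winds of equal strength get the same wind speed.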
As a descriptive analysis of the data, and before exploring the different factors, we first plot on a map the number of wildfires (denoted as \(CNT\)) as well as the resulting burnt area (denoted as \(BA\)) over time. The objective is to determine which states are the most affected by wildfires and to identify when fires happen the most.
The given dataset contains a list of coordinates in the United States. To determine which state each coordinate belongs to, the Python library \(\it{reverse\_geocoder}\) was used. This library returns the closest address for given coordinates. From it, we extracted the name of the state, summed the values per state and stored them in a dictionary keyed by state. \(CNT_i\) and \(BA_i\) denote the dictionary values for a state i in the U.S.A. Let us also denote by \(CNT_k\) and \(BA_k\) the values for a given coordinate k.
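The aggregation described above could look roughly like the sketch below. It is not the notebooks' actual code: the record layout is hypothetical, and the state lookup is passed in as a function so that the example runs without any geocoding library. In the report, that role is played by the \(\it{reverse\_geocoder}\) lookup.

```python
from collections import defaultdict


def aggregate_by_state(records, state_of):
    """Sum CNT and BA per state.

    records  : iterable of (lat, lon, cnt, ba) tuples (hypothetical layout)
    state_of : callable mapping (lat, lon) to a state name
    """
    cnt_by_state = defaultdict(float)
    ba_by_state = defaultdict(float)
    for lat, lon, cnt, ba in records:
        state = state_of(lat, lon)
        cnt_by_state[state] += cnt
        ba_by_state[state] += ba
    return dict(cnt_by_state), dict(ba_by_state)


def toy_state_of(lat, lon):
    """Toy stand-in for the geocoder, only valid for the sample points below."""
    return "California" if lon < -100 else "Florida"


# Made-up sample records
records = [
    (36.7, -119.4, 3, 10.0),
    (37.0, -120.0, 1, 2.5),
    (27.8, -81.7, 2, 0.5),
]
cnt_by_state, ba_by_state = aggregate_by_state(records, toy_state_of)
print(cnt_by_state)
print(ba_by_state)
```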
To better visualize the data, several adjustments were made. First, the number of incidents and the burnt area were divided by the total area of each state so that states can be compared. This value was then multiplied by \(10^5\) (for \(CNT\)) or \(10^4\) (for \(BA\)) in order to get the number of wildfires per \(10^5km^2\) or the burnt area per \(10^4km^2\) respectively. We also noticed that the resulting numbers range from the order of \(10^{-2}\) to \(10^{3}\). In order to have a reasonable color scale across states, a log scale was applied. The final numbers for the plots are calculated as follows:
\[Final\_CNT_i=\log_2\left(\frac{10^5\sum\limits_{k \in i}CNT_k}{total\_area\_of\_i}+1\right)\]
\[Final\_BA_i=\log_2\left(\frac{10^4\sum\limits_{k \in i}BA_k}{total\_area\_of\_i}+1\right)\]
for i a state in the U.S.A., where each sum runs over the coordinates k located in state i.
Values close to 0 would be mapped to large negative numbers by the logarithm. Hence 1 is added before taking the log, which avoids these meaningless outliers and makes the scale start at 0.
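As a worked illustration of this scaling, the sketch below computes the state-level value for made-up inputs. The function name is hypothetical and the state area is assumed to be given in \(km^2\).

```python
import math


def scaled_log_density(total, state_area, scale):
    """log2(scale * total / state_area + 1), as in the state-level formulas.

    total      : sum of CNT (or BA) over all coordinates of the state
    state_area : total area of the state, assumed to be in km^2
    scale      : 1e5 for CNT, 1e4 for BA
    """
    return math.log2(scale * total / state_area + 1)


# Made-up example: 3 fires in a state of 100,000 km^2
final_cnt = scaled_log_density(total=3, state_area=100_000, scale=1e5)
print(final_cnt)

# A state without any fire is mapped to 0 instead of minus infinity
print(scaled_log_density(total=0, state_area=50_000, scale=1e4))
```

In the first example the density is 3 fires per \(10^5km^2\), so the plotted value is \(\log_2(3+1)=2\).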
In addition, we plotted the number of incidents or the burnt area for each location as red scatter points whose size follows a scale. To adjust the numbers, a log scale was again applied. The values were obtained as follows:
\[Local\_CNT_k=4\log_2\left(CNT_k+1\right)\]
\[Local\_BA_k=2\log_2\left(BA_k+1\right)\]
for k a given coordinate of the dataset.
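The marker sizes can be computed with a few lines of code. This is only a sketch of the two formulas: the function name and the sample values are made up.

```python
import math


def marker_size(value, factor):
    """factor * log2(value + 1): factor is 4 for CNT and 2 for BA."""
    return factor * math.log2(value + 1)


local_cnt = marker_size(7, factor=4)  # 4 * log2(8)
local_ba = marker_size(3, factor=2)   # 2 * log2(4)
print(local_cnt, local_ba)
```

A coordinate with no fire gets size 0, so it does not appear on the map.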
Several interactive maps were made in Python with the chosen scaling methods but, due to technical issues, deploying the maps with an external link was not possible. However, the interactive maps can still be run locally from the files \(\it Visualisation\_general.ipynb\) and \(\it Visualisation\_specific.ipynb\), provided that the needed libraries are installed. Figures 3, 5, 6, 7 and 8 are animated plots of the interactive map for different time frames and modes (\(CNT\) or \(BA\)).
Figure 3 gives an overview of one option that can be chosen for the plot. The color scale shows the burnt area for each state, and the circle scatter plot shows the number of incidents for each coordinate.